3 research outputs found

    Information Processing in a Cognitive Model of NLP

    A model of the cognitive process of natural language processing has been developed using the formalism of generalized nets. In this stage-simulating model, the treatment of information inevitably includes phases that require joint operations in two knowledge spaces: language and semantics. To examine and formalize the relations between the language and semantic levels of treatment, the language is presented as an information system built on human cognitive resources, semantic primitives, semantic operators, and language rules and data. This approach is applied to modeling a specific grammatical rule, secondary predication in Russian. Grammatical rules of the language space are expressed as operators in the semantic space. Examples from the linguistics domain are treated, and several conclusions about the semantics of the modeled rule are drawn. The results of applying the information system approach to the language turn out to be consistent with the stages of treatment modeled with the generalized net.

    Lojbanic English, An Interlingua for Parallel Machine Translation

    We investigated machine translation using the interlingua Lojban together with our own extension, Lojbanic English. Lojban avoids ambiguity because it has 1342 primitive predicates and no polysemy. Lojbanic English was tested on a wide variety of sentence types, yielding 503 Lojban/Lojbanic English sentence tests. We developed a translator from English to Lojbanic English using our 35 generic sentence patterns. Ambiguity was avoided, but unforeseen patterns remained to be considered. We also investigated other anomalies that Lojban would (mostly) be able to avoid: the grammatical usage errors described by Swan. We implemented 80 of the 130 most common errors. The test suite was the Brown Corpus, consisting of 55889 sentences. Our system detected 35 true positives distributed among 15 of Swan’s rules. A low true positive rate, 35/55889, had been expected. No false positives were detected. When writing in Lojban, one adds new predicates incrementally, which is very time consuming. To address Lojban's insufficient vocabulary, we developed an interactive algorithm that recasts the definitions of WordNet synonym sets in terms of existing Lojban primitive predicates. The output is expressed in our Lojbanic English. If a relevant subset, e.g., 1/10, of the 116718 unique synonym-set definitions is converted into Lojban predicates, the effort would require 1.945 man-years. To avoid unforeseen syntactic patterns or implicit semantics, our Lojbanic English is designed so that the user writes only in terms of (Lojban) structure words, named entities, and Lojbanic English predicates. Off-line, English phrases of the input sentence are mapped into Lojbanic English, e.g., linking clauses or phrases. Sufficient generality is employed to allow the English phrases to be reused. At runtime, given an arbitrary English sentence, any errors in the final generated Lojbanic English are detected and corrected. Swan usage errors are avoided. This requires skill on the user's part.
This specification of Lojbanic English gives rise to a bijective function: English words can be automatically replaced with their foreign-language counterparts, in parallel. We are able to translate Lojbanic English into a Lojbanic foreign language and back (via Lojban); the result is identical to the original text. Thus, we are completely confident in the translation.
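The round-trip property claimed in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the toy lexicon, the structure-word set, the token sequence, and all function names are hypothetical, chosen only to show why a one-to-one word mapping makes translation invertible.

```python
# Hypothetical toy data: not taken from the paper.
STRUCTURE_WORDS = {"le", "cu"}  # assumed to pass through unchanged
LEXICON = {"dog": "chien", "sees": "voit", "cat": "chat"}  # assumed one-to-one


def invert(mapping):
    """Return the inverse mapping, refusing mappings that are not one-to-one."""
    inverse = {target: source for source, target in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("mapping is not injective, so it has no inverse")
    return inverse


def replace_words(tokens, mapping):
    """Replace each token independently of the others.

    Structure words are copied as-is; any other token must be in the mapping.
    """
    clash = STRUCTURE_WORDS & (set(mapping) | set(mapping.values()))
    if clash:
        raise ValueError(f"structure words collide with the lexicon: {clash}")
    return [t if t in STRUCTURE_WORDS else mapping[t] for t in tokens]


def round_trip(tokens, mapping):
    """Translate forward, then back with the inverse mapping."""
    forward = replace_words(tokens, mapping)
    return replace_words(forward, invert(mapping))


sentence = ["le", "dog", "cu", "sees", "le", "cat"]
translated = replace_words(sentence, LEXICON)
restored = round_trip(sentence, LEXICON)
```

Because each token is replaced without reference to its neighbours, the replacements could be carried out in parallel, and the inverse mapping restores the original sequence exactly. A lexicon with two source words sharing one target would fail the injectivity check, which is where polysemy would break the round trip.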

    Using Database Tables and a Non-standard Neural Network Model, for Internal Cognitive Representations 1. Language as an Information System

    One of the primary goals of a human language is to ensure the exchange of information between individuals. Information residing as an internal cognitive representation of individual H1 is presented as language-coded information, communicated to another individual H2, and interpreted into an internal cognitive representation of H2 (Figure 1). The ‘internal cognitive representation’ is considered to be related to a semantic description of the world. Since the transmission of information relates two internal representations, language communication can be thought of as building a “semantic channel” (Figure 1). To conceive an information system (IS), designers use the representation “input – treatment block – output”. On its input, an IS receives data and resources; on its output, it delivers informative products. The treatment block functions on the basis of a particular model, which includes a number of rules and operators on data. Data emerge from a data source external to the system. For the IS to function correctly, it is essential to guarantee a permanent link between the data sources and their data images on the system's input. That requires categorizing the data and storing them in separate data containers in a non-redundant way. IS engineers apply semantic modeling in order to present the data source as a cybernetic system and, on this basis, to build a structure of data containers that matches the model of the source. This approach is well known in the IS domain (see for example Codd, 1979).
    [Figure 1 labels: input (Resources – cognitive resources; Data – coded internal representation of H1); Treatment block (language model and method for data processing; rules and operators; treatment uses cognitive resources; categorization of data; organized storage); Semantic channel; internal representation of; Language Information Syste]
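As a rough illustration of the “input – treatment block – output” scheme described in this abstract, the sketch below categorizes incoming data by rules and stores each item once in a per-category container. The class, its field names, and the example rules are assumptions made for illustration; the paper's actual model is not available here.

```python
from dataclasses import dataclass, field


@dataclass
class InformationSystem:
    """Toy treatment block: rules categorize data, containers store it once."""

    # Hypothetical rule table: category name -> predicate deciding membership.
    rules: dict
    containers: dict = field(default_factory=dict)

    def categorize(self, item):
        """Return the first category whose rule accepts the item."""
        for name, rule in self.rules.items():
            if rule(item):
                return name
        return "uncategorized"

    def treat(self, data):
        """Input -> treatment -> output: store items and return the containers."""
        for item in data:
            category = self.categorize(item)
            # A set keeps storage non-redundant: repeated items are stored once.
            self.containers.setdefault(category, set()).add(item)
        return {name: sorted(items) for name, items in self.containers.items()}


system = InformationSystem(rules={"number": str.isdigit, "word": str.isalpha})
output = system.treat(["cat", "42", "cat", "dog", "?"])
```

Using a set per category is one simple way to keep storage non-redundant; a relational design would achieve the same with one table per category and a uniqueness constraint.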